Visualizing and Measuring the Geometry of BERT

Neural Information Processing Systems

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces.


Reviews: Visualizing and Measuring the Geometry of BERT

Neural Information Processing Systems

Originality: This submission uses existing techniques to analyze how syntax and semantics are represented in BERT. The authors do a good job of situating the work relative to previous work, for instance similar analyses of other models (such as Word2Vec). They also build on the work of Hewitt and Manning and provide new theoretical justification for Hewitt and Manning's empirical findings.

Quality: The mathematical arguments are sound, but the authors could add more rigor to the conclusions they draw in the remarks following Theorem 1. The empirical studies show some interesting results.


Visualizing and Measuring the Geometry of BERT

Reif, Emily, Yuan, Ann, Wattenberg, Martin, Viegas, Fernanda B., Coenen, Andy, Pearce, Adam, Kim, Been
